02. EHR Dataset Levels
EHR Dataset Levels Key Points
With EHR datasets, there are three levels.
- Line
- Encounter
- Longitudinal
These levels are extremely important in healthcare data. Being able to identify and work with data at the correct level ensures that you start with the correct data type and dataset to feed to your models.
Line Level
Line Level: A denormalized or disaggregated representation of all the things that might happen in a medical visit or encounter.
Think of a visit to the doctor for bronchitis.
Your line-level data entries could be:
- A diagnosis code of bronchitis
- A medication code for a cough suppressant
- A procedure code for a test for bronchitis
Each of these entries is one line. A line could be a diagnosis or a medication that was prescribed. Another line could hold information on a lab test that the doctor ordered to inform the diagnosis.
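A minimal sketch of what these line-level rows might look like in code. The field names (`patient_id`, `encounter_id`, `code_type`, `code`) and the code values are hypothetical placeholders, not real medical codes or a real schema.

```python
# Hypothetical line-level records for one bronchitis visit.
# Field names and code values are illustrative placeholders only.
line_level_rows = [
    {"patient_id": "A", "encounter_id": "E1",
     "code_type": "diagnosis", "code": "DX_BRONCHITIS"},
    {"patient_id": "A", "encounter_id": "E1",
     "code_type": "medication", "code": "RX_COUGH_SUPPRESSANT"},
    {"patient_id": "A", "encounter_id": "E1",
     "code_type": "procedure", "code": "PX_BRONCHITIS_TEST"},
]

# Each row describes a single event, yet all rows share one encounter.
unique_encounters = {row["encounter_id"] for row in line_level_rows}
print(len(line_level_rows), "rows,", len(unique_encounters), "encounter")
```

Note that the number of rows (3) is larger than the number of distinct encounters (1), which is the signature of line-level data.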
Encounter Level
Encounter Level: Also known as the visit level. It holds the aggregated information from the previously mentioned line level for one encounter. This information can be collapsed into a single row, with the codes stored as arrays or lists.
Using the example above, the encounter level for that visit would include the diagnosis code of bronchitis, medication code for a cough suppressant, and the procedure code for a test for bronchitis in one array or list.
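One way to sketch this collapse is a pandas `groupby` that gathers each encounter's codes into a list. The column names and code values here are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical line-level data: one row per code, all from one encounter.
line_df = pd.DataFrame({
    "encounter_id": ["E1", "E1", "E1"],
    "code": ["DX_BRONCHITIS", "RX_COUGH_SUPPRESSANT", "PX_BRONCHITIS_TEST"],
})

# Collapse to the encounter level: one row per encounter, codes in a list.
encounter_df = (
    line_df.groupby("encounter_id")["code"]
    .agg(list)
    .reset_index()
    .rename(columns={"code": "codes"})
)
print(encounter_df)
```

After the aggregation, the row count equals the number of unique encounters.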
Longitudinal Level
Longitudinal Level: Also known as the patient-level view. This level aggregates the patient history and can show how the accumulation of visits/encounters leads to some clinical impact.
Continuing with our example above, if the patient contracts bronchitis often over a series of years, we might gain some insight into a possible autoimmune disease, or know exactly what to prescribe the patient when they start seeing symptoms.
Now that you have a basic understanding of the different levels, we'll explore them a bit more with examples.
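As a rough illustration of spotting a recurring diagnosis in a patient's history, the sketch below counts bronchitis visits by year. The history, years, and code names are invented for the example.

```python
from collections import Counter

# Hypothetical encounter-level history for one patient:
# (year of visit, diagnosis codes recorded at that visit).
patient_history = [
    (2016, ["DX_BRONCHITIS"]),
    (2017, ["DX_BRONCHITIS", "DX_OTHER"]),
    (2018, ["DX_OTHER"]),
    (2019, ["DX_BRONCHITIS"]),
]

# Count the visits per year that included a bronchitis diagnosis.
bronchitis_by_year = Counter(
    year for year, codes in patient_history if "DX_BRONCHITIS" in codes
)
total_bronchitis_visits = sum(bronchitis_by_year.values())
print(dict(bronchitis_by_year), total_bronchitis_visits)
```

A pattern like this is only visible once encounters are linked under the same patient.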
EHR Dataset Levels Continued
As stated above, EHR records are commonly represented at one of three levels: line, encounter, and longitudinal. Let's review this one more time with the visual above.
Patient A had an Encounter on January 20th of 2019, where they had 3 different codes produced. Patient A also had another encounter on March 20th of 2019 with its own set of codes. Together, these encounters and line-level codes add up to the Longitudinal Level of knowledge we have on that particular patient.
The Longitudinal view is an important level for aggregating the patient history. It is where you connect information across visits/encounters and roll up information to determine trends across time.
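A sketch of rolling Patient A's records up to a single longitudinal row might look like this. It assumes one encounter per date, two codes for the March visit (the text does not say how many), and placeholder column and code names.

```python
import pandas as pd

# Hypothetical line-level rows for Patient A: 3 codes on 2019-01-20
# and (as an assumption) 2 codes on 2019-03-20.
line_df = pd.DataFrame({
    "patient_id": ["A"] * 5,
    "encounter_date": pd.to_datetime(["2019-01-20"] * 3 + ["2019-03-20"] * 2),
    "code": ["CODE_1", "CODE_2", "CODE_3", "CODE_4", "CODE_5"],
})

# Roll up to the longitudinal level: one row per patient.
longitudinal_df = line_df.groupby("patient_id").agg(
    n_encounters=("encounter_date", "nunique"),
    first_encounter=("encounter_date", "min"),
    last_encounter=("encounter_date", "max"),
    all_codes=("code", list),
).reset_index()
print(longitudinal_df)
```

The resulting table has one row per patient, so the row count matches the number of unique patients.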
Why are EHR levels important?
Using the wrong EHR dataset level can lead to major errors when building models, because data preparation is then done with faulty assumptions.
For example, a common error is the duplication of encounter information when you take a line-level dataset and treat it as an encounter-level dataset.
Example:
A particular encounter might have 50 lines and that might be treated inadvertently as 50 distinct encounters when it is actually one encounter. This has the effect of upsampling certain common values for that encounter in your dataset, but also creates a great deal of noise since those 50 lines might have only slight differences.
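The upsampling effect can be sketched with a toy dataset. The `admission_type` field and its values are hypothetical, chosen only to show how one 50-line encounter can dominate a statistic.

```python
# Hypothetical line-level rows as (encounter_id, admission_type).
# Encounter E1 has 50 lines; encounter E2 has a single line.
rows = [("E1", "emergency")] * 50 + [("E2", "elective")]

# Mistake: treat every row as its own encounter.
naive_emergency_share = sum(1 for _, t in rows if t == "emergency") / len(rows)

# Safer: keep one record per encounter before computing the share.
per_encounter = dict(rows)  # one admission_type per encounter_id
correct_emergency_share = (
    sum(1 for t in per_encounter.values() if t == "emergency") / len(per_encounter)
)
print(round(naive_emergency_share, 2), correct_emergency_share)
```

Here the naive calculation reports that about 98% of "encounters" were emergencies, while the per-encounter figure is 50%.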
Further, it is easy to select the wrong encounter from the patient record. There might be a case where you only want the earliest or latest visit or state for a patient, or a specific time step for your model. Picking the wrong one can cause many issues that might not become apparent until the modeling or deployment phases of your project.
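One possible way to pick the earliest or latest encounter per patient explicitly, rather than by accident, is to sort by date and keep the first or last row for each patient. The column names and dates below are assumptions.

```python
import pandas as pd

# Hypothetical encounter-level data: one row per encounter.
encounter_df = pd.DataFrame({
    "patient_id": ["A", "A", "B"],
    "encounter_id": ["E1", "E2", "E3"],
    "encounter_date": pd.to_datetime(["2019-01-20", "2019-03-20", "2019-02-01"]),
})

# Sort so that each patient's encounters are in chronological order.
ordered = encounter_df.sort_values(["patient_id", "encounter_date"])

# Keep exactly one encounter per patient, chosen deliberately.
earliest = ordered.drop_duplicates("patient_id", keep="first")
latest = ordered.drop_duplicates("patient_id", keep="last")
print(earliest["encounter_id"].tolist(), latest["encounter_id"].tolist())
```

Making the selection rule explicit in code also documents which encounter the model is meant to see.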
How do you know the dataset level for your data?
This is fairly easy if you collect some key metrics from your dataset. There are different ways to do this, but here are two simple ones.
- The total number of rows in the dataset. This is a simple calculation with `len()`.
- The number of unique encounters or visits. You can calculate this by finding the field(s) that identify a unique encounter and using `nunique()`.
Example
total_rows = len(fake_df)
total_unique_encounters = fake_df['encounter'].nunique()
From here we do some simple calculations to figure out our dataset level.
If the total number of rows is greater than the number of unique encounters, it is at the line level.
Again using our example from above:
total_rows = len(fake_df)
total_unique_encounters = fake_df['encounter'].nunique()
If the output was
total_rows = 43464
total_unique_encounters = 3259
then
print(total_rows > total_unique_encounters)
would print True.
Therefore this dataset would be at the line level.
If the total number of rows is equal to the number of unique encounters, it is at the encounter level.
Again using our example from above:
total_rows = len(fake_df)
total_unique_encounters = fake_df['encounter'].nunique()
If the output was
total_rows = 3464
total_unique_encounters = 3464
then
print(total_rows == total_unique_encounters)
would print True.
Therefore this dataset would be at the encounter level.
Longitudinal Level
For the longitudinal or patient level, you will see multiple encounters grouped under a patient. You might not even see the encounter id field, since this information is collapsed/aggregated under a unique patient id. In this case, the total number of rows should equal the total number of unique patients.
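The three checks described in this section could be combined into one helper. This is only a heuristic sketch: `infer_dataset_level` is a hypothetical function, `patient_id` is an assumed column name, and `encounter` follows the `fake_df['encounter']` example above.

```python
import pandas as pd

def infer_dataset_level(df, encounter_col="encounter", patient_col="patient_id"):
    """Guess the EHR dataset level from row and unique-id counts (heuristic)."""
    total_rows = len(df)
    if encounter_col in df.columns:
        total_unique_encounters = df[encounter_col].nunique()
        if total_rows > total_unique_encounters:
            return "line"
        if total_rows == total_unique_encounters:
            return "encounter"
    elif patient_col in df.columns and total_rows == df[patient_col].nunique():
        # No encounter id field and one row per patient.
        return "longitudinal"
    return "unknown"

# Tiny hypothetical datasets, one per level.
line_df = pd.DataFrame({"patient_id": ["A", "A"], "encounter": ["E1", "E1"]})
enc_df = pd.DataFrame({"patient_id": ["A", "A"], "encounter": ["E1", "E2"]})
long_df = pd.DataFrame({"patient_id": ["A", "B"], "n_encounters": [2, 1]})

for name, frame in [("line_df", line_df), ("enc_df", enc_df), ("long_df", long_df)]:
    print(name, "->", infer_dataset_level(frame))
```

A check like this is a starting point rather than a guarantee. For instance, a dataset with exactly one encounter per patient satisfies both the encounter-level and the patient-level condition, so it is still worth inspecting the columns by hand.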
EHR Dataset Levels
QUIZ QUESTION:
Match the correct Dataset Level to each example or definition.
ANSWER CHOICES:
Example/Definition | Dataset Level |
---|---|
Gives the whole view of the patient. | |
027004Z (AKA Heart Surgery) | |
Has all of the codes for a given visit to a healthcare provider | |
Is the code for a procedure, medication, or diagnosis | |
Would be grouped by | |
SOLUTION:
Example/Definition | Dataset Level |
---|---|
Has all of the codes for a given visit to a healthcare provider | Encounter |
Gives the whole view of the patient. | Longitudinal |
Would be grouped by | |
027004Z (AKA Heart Surgery) | Line |
Is the code for a procedure, medication, or diagnosis | Line |
Line Level
SOLUTION:
`total_rows > total_unique_encounters` = `True`
Encounter Level Quiz
SOLUTION:
- `total_rows == total_unique_encounters` = `True`
- total rows = 745,838, unique encounters = 745,838
Reflect
QUESTION:
Why is it so important to make sure your dataset is at the correct level before using it to build a model?
ANSWER:
The incorrect dataset level can lead to major errors when building models, because data preparation was done with faulty assumptions. This could lead to duplication of encounter information.
Also, selecting the wrong or a random encounter from the patient record can have a large negative effect on your model that you won't see until deployment.
These are only a few potential problems; you may come up with some others as well! Thanks for completing this.